DISCLAIMER: This notebook is from what I was able to teach myself about Plotly in a relatively short time. Thus, anything I wrote or coded here may not be wholly accurate, thoroughly descriptive, perfectly optimal or efficient, or in line with standard Plotly syntax and practices of experienced users. It's an extremely sophisticated and in-depth library, so I suggest that in addition to this brief tutorial, everyone look at the official documentation and/or watch a few YouTube tutorials to get a better understanding of how to use it.¶

Installation¶

It's probably best to create a new conda environment for the hackathon, and you'll want to install the following packages into that environment.

Matplotlib and Seaborn are included in the list as two alternative standard Python plotting libraries, but they aren't strictly necessary to run most of the code in this notebook (Seaborn is only needed for the imports cell and for the 3D scatter and scatter matrix examples, which load a dataset from Seaborn).

In [ ]:
# conda create -n hackathon python=3.11
# conda activate hackathon
# conda install numpy
# conda install pandas
# conda install plotly  
# conda install nbformat
# conda install orjson
# pip install kaleido
# conda install jupyter
# conda install ipykernel
# pip install dash
# conda install seaborn
# conda install matplotlib

Imports¶

In [ ]:
import numpy as np
import pandas as pd
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
pio.renderers.default='notebook'

# import matplotlib.pyplot as plt
# %matplotlib inline

What is Plotly?¶

Plotly is an open-source library built on JavaScript (plotly.js) that supports multiple languages, including Python and R. It allows for fast and easy creation of over 40 types of INTERACTIVE plots, allowing you to increase the amount of information a plot can display while simultaneously simplifying how it's created. An interactive plot is not static: its layout, styling, graph type, and visualized data subset can change as you interact with it using your cursor.

Main page: https://plotly.com/python/

API documentation: https://plotly.com/python-api-reference/

Official forum for questions: https://community.plotly.com/c/plotly-python/

Great tutorial on Plotly for Python: https://www.youtube.com/watch?v=GGL6U0k8WYA&t=2111s

Below is a great demonstration of what Plotly can allow you to create with just a single function call:

In [ ]:
data = px.data.gapminder()
px.scatter(data , x="gdpPercap", y="lifeExp", animation_frame="year", 
           animation_group="country",
           size="pop", color="continent", hover_name="country",
           log_x=True, size_max=55, range_x=[100,100000], range_y=[25,90])

Dash¶

Dash is a web application framework for Python that allows you to create interactive web applications. It's built on top of Plotly, so you can use all of the Plotly functionality you've learned here for Dash app development. Dash plots tend to allow for far more complex interactivity than Plotly-only plots, as you can more easily combine multiple features like sliders and buttons (discussed a little later on) to work in tandem thanks to Dash's dynamic "callback" capabilities. You can even have plots that dynamically adjust their content based on the point(s) you've selected in another plot.

IF YOU'RE BRAVE ENOUGH to learn about Dash in addition to already learning about Plotly... I've made a short tutorial on Dash that breaks down the following code, if you're interested in potentially incorporating Dash into your hackathon project.

In [ ]:
from dash import Dash, html, dash_table, dcc, callback, Output, Input

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

app = Dash(__name__, external_stylesheets=external_stylesheets)

df = pd.read_csv('https://plotly.github.io/datasets/country_indicators.csv')

app.layout = html.Div([
    # dash_table.DataTable(data=df.to_dict('records'), page_size=6),
    dash_table.DataTable(
        data=df.to_dict('records'), 
        columns=[{'name': i, 'id': i} for i in df.columns],
        page_size=6,
        filter_action="native",
        style_table={'overflowX': 'auto'},
        style_cell={'textAlign': 'left'}
    ),
    html.Div([
        html.Div([
            dcc.Dropdown(
                options=df['Indicator Name'].unique(),
                value='Fertility rate, total (births per woman)',
                id='crossfilter-xaxis-column',
            ),
            dcc.RadioItems(
                options=['Linear', 'Log'],
                value='Linear',
                id='crossfilter-xaxis-type',
                labelStyle={'display': 'inline-block', 'marginTop': '5px'}
            )
        ],
        style={'width': '49%', 'display': 'inline-block'}),

        html.Div([
            dcc.Dropdown(
                options=df['Indicator Name'].unique(),
                value='Life expectancy at birth, total (years)',
                id='crossfilter-yaxis-column'
            ),
            dcc.RadioItems(
                options=['Linear', 'Log'],
                value='Linear',
                id='crossfilter-yaxis-type',
                labelStyle={'display': 'inline-block', 'marginTop': '5px'}
            )
        ], style={'width': '49%', 'float': 'right', 'display': 'inline-block'})
        
    ], style={
        'padding': '10px 5px'
    }),

    html.Div([
        dcc.Graph(
            id='crossfilter-indicator-scatter',
            hoverData={'points': [{'customdata': 'Japan'}]}
        )
    ], style={'width': '49%', 'display': 'inline-block', 'padding': '0 20'}),
    
    html.Div([
        dcc.Graph(id='x-time-series'),
        dcc.Graph(id='y-time-series'),
    ], style={'display': 'inline-block', 'width': '49%'}),

    html.Div([
        html.Div([
            html.Label('Year:', style={'color': 'red'}),
            dcc.Slider(
                df['Year'].min(),
                df['Year'].max(),
                step=None,
                id='crossfilter-year--slider',
                value=df['Year'].max(),
                marks={str(year): str(year) for year in df['Year'].unique()},
                tooltip={"placement": "bottom", "always_visible": True}
            )
        ], style={'width': '48%', 'display': 'inline-block', 'padding': '0px 10px 20px 10px'}),
        
        html.Div([
            html.Label('Year Range Slider:', style={'color': 'green'}),
            dcc.RangeSlider(
                df['Year'].min(),
                df['Year'].max(),
                step=1,
                id='crossfilter-year-range--slider',
                value=[df['Year'].min(), df['Year'].max()],
                marks={str(year): str(year) for year in df['Year'].unique()},
                tooltip={"placement": "bottom", "always_visible": True}
            )
        ], style={'width': '48%', 'display': 'inline-block', 'padding': '0px 10px 20px 10px'})
    ])
])


def create_time_series(dff, axis_type, title):
    fig = px.scatter(dff, x='Year', y='Value')
    fig.update_traces(mode='lines+markers')
    fig.update_xaxes(showgrid=False)
    fig.update_yaxes(type='linear' if axis_type == 'Linear' else 'log') 
    fig.add_annotation(x=0, y=0.85, xanchor='left', yanchor='bottom',
                       xref='paper', yref='paper', showarrow=False, align='left',
                       text=title)
    fig.update_layout(height=225, margin={'l': 20, 'b': 30, 'r': 10, 't': 10})
    return fig


@callback(
    Output(component_id='crossfilter-indicator-scatter', component_property='figure'),
    Input(component_id='crossfilter-xaxis-column', component_property='value'),
    Input(component_id='crossfilter-yaxis-column', component_property='value'),
    Input(component_id='crossfilter-xaxis-type', component_property='value'),
    Input(component_id='crossfilter-yaxis-type', component_property='value'),
    Input(component_id='crossfilter-year--slider', component_property='value'))
def update_graph(xaxis_column_name, #crossfilter-xaxis-column
                 yaxis_column_name, #crossfilter-yaxis-column
                 xaxis_type, #crossfilter-xaxis-type
                 yaxis_type, #crossfilter-yaxis-type
                 year_value #crossfilter-year--slider
                 ):
    dff = df[df['Year'] == year_value]
    fig = px.scatter(x=dff[dff['Indicator Name'] == xaxis_column_name]['Value'],
            y=dff[dff['Indicator Name'] == yaxis_column_name]['Value'],
            hover_name=dff[dff['Indicator Name'] == yaxis_column_name]['Country Name']
            )
    fig.update_traces(customdata=dff[dff['Indicator Name'] == yaxis_column_name]['Country Name'])
    fig.update_xaxes(title=xaxis_column_name, type='linear' if xaxis_type == 'Linear' else 'log')
    fig.update_yaxes(title=yaxis_column_name, type='linear' if yaxis_type == 'Linear' else 'log')
    fig.update_layout(margin={'l': 40, 'b': 40, 't': 10, 'r': 0}, hovermode='closest')
    return fig


@callback(
    Output(component_id='x-time-series', component_property='figure'),
    Input(component_id='crossfilter-indicator-scatter', component_property='hoverData'),
    Input(component_id='crossfilter-xaxis-column', component_property='value'),
    Input(component_id='crossfilter-xaxis-type', component_property='value'),
    Input(component_id='crossfilter-year-range--slider', component_property='value'))
def update_x_timeseries(hoverData, xaxis_column_name, axis_type, year_range):
    country_name = hoverData['points'][0]['customdata']
    dff = df[(df['Country Name'] == country_name) & (df['Indicator Name'] == xaxis_column_name)]
    dff = dff[(dff['Year'] >= year_range[0]) & (dff['Year'] <= year_range[1])]
    title = '<b>{}</b><br>{}'.format(country_name, xaxis_column_name)
    return create_time_series(dff, axis_type, title)


@callback(
    Output(component_id='y-time-series', component_property='figure'),
    Input(component_id='crossfilter-indicator-scatter', component_property='hoverData'),
    Input(component_id='crossfilter-yaxis-column', component_property='value'),
    Input(component_id='crossfilter-yaxis-type', component_property='value'),
    Input(component_id='crossfilter-year-range--slider', component_property='value'))
def update_y_timeseries(hoverData, yaxis_column_name, axis_type, year_range):
    country_name = hoverData['points'][0]['customdata']
    dff = df[(df['Country Name'] == country_name) & (df['Indicator Name'] == yaxis_column_name)]
    dff = dff[(dff['Year'] >= year_range[0]) & (dff['Year'] <= year_range[1])]
    return create_time_series(dff, axis_type, yaxis_column_name)


if __name__ == '__main__':
    port = 8052
    print(f"Dash app running on http://127.0.0.1:{port}/")
    app.run(debug=True, port=port)
Dash app running on http://127.0.0.1:8052/

Not only are Plotly plots phenomenal for personal visualization in your code editor, they can also be saved as standalone HTML files or published to web applications using Dash for sharing with others. Plots can also be saved in static formats like PNG, SVG, and PDF if you need to, but they lose their interactive nature when you do so.

Additionally, Plotly plots are just JSON objects under the hood, making them easily storable and loadable, and thus able to be easily shared or moved between languages like Python and R. For example, here's how you can convert a Plotly plot to a JSON format, and then feed it right back in to Plotly to recreate the plot:

In [ ]:
# Create a basic bar chart and convert it to a JSON format
fig = px.bar(x=[1, 2, 3], y=[1, 3, 2])
as_json = fig.to_json()
print("Here's a peek at what a Plotly figure looks like when converted to a JSON format:")
print(f'{as_json[:250]}...')

# Convert the JSON string back to a Plotly figure
# (pio.from_json already returns a go.Figure, so no extra conversion step is needed)
import plotly.io as pio
fig_from_json = pio.from_json(as_json)
fig_from_json.show()
Here's a peek at what a Plotly figure looks like when converted to a JSON format:
{"data":[{"alignmentgroup":"True","hovertemplate":"x=%{x}\u003cbr\u003ey=%{y}\u003cextra\u003e\u003c\u002fextra\u003e","legendgroup":"","marker":{"color":"#636efa","pattern":{"shape":""}},"name":"","offsetgroup":"","orientation":"v","showlegend":fals...

Plotly dataframes and built-in datasets¶

Plotly has numerous built-in datasets, which are great for testing out different plot types and learning how to use Plotly. You can see the full list of datasets with their descriptions here: https://plotly.com/python-api-reference/generated/plotly.data.html

Here's an example dataset from Plotly called "gapminder". This dataset contains data on life expectancy, GDP per capita, and population of 142 countries from 1952 to 2007.

In [ ]:
df = px.data.gapminder()
df.head()
Out[ ]:
country continent year lifeExp pop gdpPercap iso_alpha iso_num
0 Afghanistan Asia 1952 28.801 8425333 779.445314 AFG 4
1 Afghanistan Asia 1957 30.332 9240934 820.853030 AFG 4
2 Afghanistan Asia 1962 31.997 10267083 853.100710 AFG 4
3 Afghanistan Asia 1967 34.020 11537966 836.197138 AFG 4
4 Afghanistan Asia 1972 36.088 13079460 739.981106 AFG 4

When reading in these Plotly datasets using "px.data.{dataset name}", the datasets are automatically read in as pandas dataframes. This is great because it means that you can use all of the pandas functions to manipulate the data before plotting it.

Pandas is a fast, powerful, flexible, and easy to use open-source data analysis and data manipulation library built on top of the Python programming language. It is used for data manipulation, data cleaning, data analysis, and data visualization.

Express VS Graph Objects¶

Plotly has two different major methods to create plots, one using the "express" module and the other using the "graph_objects" module.

Express (px)¶

The express module is the higher-level interface for creating figures. It's easier to use and recommended for most users, but it offers fewer customization options. Here's a link to the API for the express module: https://plotly.com/python-api-reference/plotly.express.html

Graph Objects (go)¶

The graph_objects module is a lower-level interface for creating figures. It's more complicated to use and requires many more lines of code, but ultimately allows for more customization. Here's a link to the API for the graph objects module: https://plotly.com/python-api-reference/plotly.graph_objects.html

Express¶

Express syntax is much more concise and straightforward, as you can create a plot with just one line of code by calling px.{graph type} and feeding it your data. Here's an example of a basic line plot using the express module:

In [ ]:
import plotly.express as px
df = px.data.gapminder()
fig = px.line(df, x = 'year', y = 'gdpPercap', title = 'GDP per Capita Over Time')
fig.show()

Graph Objects¶

Graph objects syntax is more complicated and lengthy, but allows for more customization. Here's an example of the same line plot from the previous cell, but using the graph objects module. Here, we create a figure object and then add a scatter trace to it with add_trace(). A trace is simply one plotted series of a given type, such as a scatter plot, line plot, or bar plot. Plotly makes it easy to layer multiple plot types on top of each other, so here we also add a bar chart as a second trace. We then modify the layout of the figure object using update_layout() to add a title, x-axis label, and y-axis label.

In [ ]:
import plotly.graph_objects as go
df = px.data.gapminder()
fig = go.Figure()
fig.add_trace(go.Scatter(x = df['year'], y = df['gdpPercap'], 
                         mode = 'lines', name = 'GDP per Capita'),)
fig.add_trace(go.Bar(x=df['year'], y=df['gdpPercap'], name='GDP per Capita'))
fig.update_layout(title='GDP Per Capita Over Time',
                  xaxis_title='Year',
                  yaxis_title='gdpPercap')
fig.show()

Mixing Express and Graph Objects¶

If you start with express functions, you can also add in graph objects syntax afterwards on top of it. Here's an example of creating a line plot using express for the first trace, and then using graph objects to add in a bar chart for the second trace. You can additionally then update layout elements using the update_layout() function.

In [ ]:
df = px.data.gapminder()
fig = px.line(df, x = 'year', y = 'gdpPercap', title = 'GDP per Capita Over Time')

fig.add_trace(go.Bar(x=df['year'], y=df['gdpPercap'], name='GDP per Capita'))
fig.update_layout(plot_bgcolor='gray')
fig.show()

Plotly's interactivity features¶

Plots shown using "fig.show()" are interactive at a basic level by default when using a supported code editor (which, from what I know, primarily includes Jupyter Notebook, JupyterLab, and Visual Studio Code).

Toolbar¶

Using your cursor and the icons in the toolbar at the upper right of the plot, you can...

  • Hover over points to see basic information about them
  • Click on legend items to hide or show their corresponding data/trace on the plot
  • Zoom in and out by clicking and dragging to specify a rectangle you want to zoom into
  • Pan around the graph
  • Reset the plot back to normal if you altered the viewing window
  • Autoscale the axes
  • Save the plot as a PNG file.

Here's an example of a basic interactive plot using Plotly Express

In [ ]:
df_stocks = px.data.stocks()
print('All data columns:')
print(df_stocks.columns)
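# With several y columns (wide-form data), the label keys are the x column name plus 'value' and 'variable'
px.line(df_stocks, x='date', y=['GOOG','AAPL'],
        labels={'date':'Date', 'value':'Price', 'variable':'Company'}, title='Apple Vs. Google')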
All data columns:
Index(['date', 'GOOG', 'AAPL', 'AMZN', 'FB', 'NFLX', 'MSFT'], dtype='object')

Plotly is also amazing for viewing 3D plots, as you can rotate the plot to view it from different angles. Here's an example of a 3D scatter plot using Plotly Express:

In [ ]:
flights = sns.load_dataset("flights")
fig = px.scatter_3d(flights, x='year', y='month', z='passengers', color='year',
                   opacity=0.7, width=800, height=400)
fig.show()

Hover Window/Text¶

https://plotly.com/python/hover-text-and-formatting/

An amazing feature of Plotly is that you can hover over individual data points to get additional information. At the basic level, the hover window will give you a point's exact x and y values. However, you can also have the hover window display a data point's other values that aren't being directly plotted. In the figure below showing sepal measurements for a variety of different Iris flowers, the hover window for any particular flower has been configured to display additional information about its petal measurements as well.

If using express, you can specify what additional information you want to display in the hover window using "hover_data" and passing in the column names from your dataframe that you want.

In [ ]:
df_iris = px.data.iris()
print('All data columns:')
print(df_iris.columns)
fig = px.scatter(df_iris, x="sepal_width", y="sepal_length", color="species",
                 size='petal_length', hover_data=['petal_width', 'species_id'])
fig.show()
All data columns:
Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species',
       'species_id'],
      dtype='object')

If using graph objects, it's a bit more involved but far more customizable, instead using the "customdata" and "hovertemplate" arguments.

In [ ]:
df_iris = px.data.iris()
fig = go.Figure()
species = df_iris['species'].unique()
for sp in species:
    df_subset = df_iris[df_iris['species'] == sp]
    fig.add_trace(go.Scatter(
        x=df_subset['sepal_width'],
        y=df_subset['sepal_length'],
        mode='markers',
        marker=dict(size=df_subset['petal_length'] * 5),
        name=sp,
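        # petal_length is added to customdata so the hover shows the real value,
        # not the scaled marker size
        customdata=df_subset[['petal_width', 'species_id', 'petal_length']],
        hovertemplate=(
            "Sepal Width: %{x}<br>"
            "Sepal Length: %{y}<br>"
            "Petal Length: %{customdata[2]}<br>"
            "Petal Width: %{customdata[0]}<br>"
            "Species ID: %{customdata[1]}"
        )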
    ))
fig.update_layout(
    title='Iris Dataset Scatter Plot',
    xaxis_title='Sepal Width',
    yaxis_title='Sepal Length',
    legend_title='Species'
)
fig.show()

Hovertemplate¶

No matter which method you use, you can use the "hovertemplate" argument to specify custom formatting for how information in a hover window is displayed.

For example, you can use the "%{x}" and "%{y}" to display the x and y values of the data point you're hovering over, and can use "%{customdata[i]}" for the additional data values. If you're using express, the entries of "customdata" will be created from the columns you specify in "hover_data", and then the template is altered using the update_traces() function. If you're using graph objects, you can specify the entries of "customdata" directly in the trace, and then alter the template using the "hovertemplate" argument.

In [ ]:
df_iris = px.data.iris()
print('All data columns:')
print(df_iris.columns)
fig = px.scatter(df_iris, x="sepal_width", y="sepal_length", color="species",
                 size='petal_length', hover_data=['petal_width', 'species_id'])

custom_hover_template = '<br>CustomX: %{x} <br>CustomY: %{y} <br>CustomAddon1: %{customdata[0]} <br>CustomAddon2: %{customdata[1]}'
fig.update_traces(hovertemplate = custom_hover_template)
All data columns:
Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species',
       'species_id'],
      dtype='object')

Sliders, Buttons/Dropdowns, and Range Selectors¶

Sliders, buttons/dropdowns, and range selectors are the other major interactive features that can be added to Plotly plots. They let you interactively choose an option that changes aspects of the plot, such as the focal subset of the data, layout and styling elements, graph style, and viewing range.

For both sliders and dropdowns, the first key argument to create the interactivity is the "method" argument, set on each entry of the "steps" list (for sliders) or the "buttons" list (for dropdowns). Specifically, the "restyle" method specifies that the data and style attributes of the plot's traces should be updated when a new slider or dropdown option is selected. There are also the "relayout" and "update" methods. "relayout" specifies that the layout and formatting options should be updated when a new option is selected, and "update" is a combination of both "restyle" and "relayout", i.e. it changes both data and layout aspects.

The second key argument is the "args" argument, which specifies what the new chosen option should do in terms of altering the data subset or plot layout. Lastly, the "label" argument allows you to control what the buttons or slider ticks are labeled as.

Buttons/Dropdown Menus (officially called "Buttons")¶

https://plotly.com/python/custom-buttons/

Here's a simple example of a plot with a dropdown menu that lets you choose which continent's countries are shown, allowing you to observe how life expectancy differed between the countries of different continents in 2007.

In [ ]:
df = px.data.gapminder()
start_continent = 'Asia'
focal_year = 2007
# Combine both conditions into one boolean mask (chained indexing triggers a pandas reindexing warning)
mask = (df['continent'] == start_continent) & (df['year'] == focal_year)
fig = px.bar(df[mask], x='country', y='lifeExp', title=f'Life Expectancy by Continental Countries in {focal_year}')

continent_buttons = [
    {'method': 'restyle',
     'label': continent,
     'args': [{'y': [df[(df['continent'] == continent) & (df['year'] == focal_year)]['lifeExp']],
               'x': [df[(df['continent'] == continent) & (df['year'] == focal_year)]['country']]}]
    } for continent in df['continent'].unique()
]

fig.update_layout(updatemenus=[{'buttons': continent_buttons,}],)
fig.show()

Here's another example showing how the styling can also be changed: a plot with dropdown menus that let you specify what color the line is, as well as whether a sine or cosine curve is shown.

Note that unlike the previous example which uses buttons to change the actual data subset being used, this example involves plotting both curves, but using buttons to toggle the visibility of either curve (see the "visible" arguments throughout). This is an alternative way to use buttons to change the data being shown.

In [ ]:
x = np.linspace(-2 * np.pi, 2 * np.pi, 1000)
df = pd.DataFrame({'x': x, 'y1': np.sin(x), 'y2': np.cos(x)})

fig = go.Figure()
fig.add_trace(go.Scatter(x=df['x'], y=df['y1'], mode='lines', name='A'))
fig.add_trace(go.Scatter(x=df['x'], y=df['y2'], mode='lines', name='B', visible=False))

fig.update_layout(
    title='Drop down menus - Styling',
    xaxis=dict(domain=[0.1, 1]),
    yaxis=dict(title='y'),
    updatemenus=[
        dict(
            y=0.8,
            buttons=[
                dict(
                    method='restyle',
                    args=['line.color', 'blue'],
                    label='Blue'
                ),
                dict(
                    method='restyle',
                    args=['line.color', 'red'],
                    label='Red'
                )
            ]
        ),
        dict(
            y=0.7,
            buttons=[
                dict(
                    method='restyle',
                    args=['visible', [True, False]],
                    label='Sin'
                ),
                dict(
                    method='restyle',
                    args=['visible', [False, True]],
                    label='Cos'
                )
            ]
        )
    ]
)
fig.show()

Sliders¶

https://plotly.com/python/sliders/

Here's the reverse of the dropdown example: a plot with a slider that lets you choose which year's data is shown for a fixed continent, allowing you to observe how life expectancy across Asian countries has changed over time.

In [ ]:
df = px.data.gapminder()
start_year = 1952
focal_continent = 'Asia'
# Combine both conditions into one boolean mask (chained indexing triggers a pandas reindexing warning)
mask = (df['year'] == start_year) & (df['continent'] == focal_continent)
fig = px.bar(df[mask], x='country', y='lifeExp', title=f'Life Expectancy by Country in {focal_continent} Over Time')

year_slider = [
    {'method': 'restyle',
     'label': str(year),
     'args': [{'y': [df[(df['year'] == year) & (df['continent'] == focal_continent)]['lifeExp']],
               'x': [df[(df['year'] == year) & (df['continent'] == focal_continent)]['country']]}]
    } for year in sorted(df['year'].unique())
]
fig.update_layout(sliders=[{'steps': year_slider,}])
fig.show()

Animations¶

Similar to sliders, animations allow you to observe how data changes across the values of some variable. Animations are generally more powerful than sliders because they automatically cycle through the data for you, and they are easier to use because you can build them with express alone, without any graph objects syntax.

In [ ]:
df_cnt = px.data.gapminder()
fig = px.bar(df_cnt, x="continent", y="pop", hover_data=['country'], color="continent",
  animation_frame="year", animation_group="country", range_y=[0,4000000000])
fig.show()

Slider and Dropdown together¶

I've been led to believe it may not be doable in base Plotly... You can add both, but they don't seem to work in tandem; instead, each updates the plot separately when changed... But maybe you could be the one to figure out how! It is definitely doable using Dash though (see below).

In [ ]:
import dash
from dash import dcc, html, Input, Output

df = px.data.gapminder()

app = dash.Dash(__name__)
app.layout = html.Div([
    dcc.Dropdown(
        id='continent-dropdown',
        options=[{'label': i, 'value': i} for i in df['continent'].unique()],
        value='Asia'  # Default value
    ),
    dcc.Slider(
        id='year-slider',
        min=df['year'].min(),
        max=df['year'].max(),
        value=df['year'].min(),
        marks={str(year): str(year) for year in df['year'].unique()},
        step=None
    ),
    dcc.Graph(id='life-exp-plot')
])

@app.callback(
    Output('life-exp-plot', 'figure'),
    [Input('continent-dropdown', 'value'),
     Input('year-slider', 'value')]
)
def update_figure(selected_continent, selected_year):
    filtered_df = df[(df['continent'] == selected_continent) & (df['year'] == selected_year)]
    fig = px.bar(filtered_df, x='country', y='lifeExp', title='Life Expectancy by Country')
    return fig

# Run the app
if __name__ == '__main__':
    # app.run() matches the earlier Dash example; run_server() is the older name for the same call
    app.run(debug=True, port=1051)

Range sliders/selectors¶

https://plotly.com/python/range-slider/

Range sliders are a bit different from regular sliders, as they allow you to specify a range of values to be shown in the plot. You can also add range selector buttons that allow you to quickly zoom in on a particular range of values. Here's an example of a plot showing Apple's stock prices over time with a range slider and range selector buttons that let you zoom in on the price within a particular timeframe.

In [ ]:
df = pd.read_csv(
    "https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv")
df.columns = [col.replace("AAPL.", "") for col in df.columns]

# Create figure
fig = go.Figure()
fig.add_trace(
    go.Scatter(x=list(df.Date), y=list(df.High)))

# Add range slider
fig.update_layout(
    title_text="Time series with range slider and selectors",
    xaxis=dict(
        rangeselector=dict(
            buttons=list([
                dict(count=1,
                     label="1m",
                     step="month",
                     stepmode="backward"),
                dict(count=6,
                     label="6m",
                     step="month",
                     stepmode="backward"),
                dict(count=1,
                     label="1y",
                     step="year",
                     stepmode="backward"),
                dict(count=1,
                     label="YTD",
                     step="year",
                     stepmode="todate"),
                dict(step="all")
            ])
        ),
        rangeslider=dict(
            visible=True
        ),
        type="date"
    )
)
fig.show()

Subplots¶

https://plotly.com/python/subplots/

You can create multi-panel plots in Plotly, allowing you to include multiple plots as subplots in a single figure. When making subplots, the express module is quite limited, and graph objects are typically necessary for anything beyond simple faceting.

Plotly Express¶

Plotly Express has a simple way to create subplots using the "facet_col" and "facet_row" arguments. However, this method is limited in that you can only create subplots of a single type of plot (e.g. scatter plots, line plots, etc.) that all draw on the same data.

The example below plots a histogram of the total restaurant bill, where bar heights give the summed tip amount within each bill bin (the default aggregation when "y" is passed), split by the customer's sex. The "facet_col" argument creates a subplot column for each day of the week, and the "facet_row" argument creates a subplot row for each time of day (lunch vs. dinner). You can also use the "category_orders" argument to control the order of the rows and columns, such as whether lunch is on the top or bottom row, or the order in which the days appear (the cell below deliberately lists the days out of order).

In [ ]:
df_tips = px.data.tips()
print('All data columns:')
print(df_tips.columns)
fig = px.histogram(df_tips, x="total_bill", y="tip", color="sex", facet_row="time", facet_col="day",
       category_orders={"day": ["Thur", "Sat", "Fri", "Sun"], "time": ["Lunch", "Dinner"]})
fig.show()
All data columns:
Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size'], dtype='object')

Plotly Graph Objects¶

Using graph objects to create subplots in Plotly is preferred when the subplots may contain different types of plots (e.g. scatter plots and line plots) or plots of different datasets. Each subplot is added as a trace to the figure object, and you can specify which row and column it should be placed in. The layout options for the whole figure can then be modified. This method also involves an additional function, "make_subplots" (from the "plotly.subplots" module), which creates the figure object with the desired number of rows and columns for the subplots.

In [ ]:
from plotly.subplots import make_subplots

# Initialize figure with subplots
fig = make_subplots(
    rows=2, cols=2, subplot_titles=("Plot 1", "Plot 2", "Plot 3", "Plot 4"))
# Add traces
fig.add_trace(go.Scatter(x=[1, 2, 3], y=[4, 5, 6]), row=1, col=1)
fig.add_trace(go.Scatter(x=[20, 30, 40], y=[50, 60, 70]), row=1, col=2)
fig.add_trace(go.Scatter(x=[300, 400, 500], y=[600, 700, 800]), row=2, col=1)
fig.add_trace(go.Scatter(x=[4000, 5000, 6000], y=[7000, 8000, 9000]), row=2, col=2)

# Update xaxis properties
fig.update_xaxes(title_text="xaxis 1 title", row=1, col=1)
fig.update_xaxes(title_text="xaxis 2 title", range=[10, 50], row=1, col=2)
fig.update_xaxes(title_text="xaxis 3 title", showgrid=False, row=2, col=1)
fig.update_xaxes(title_text="xaxis 4 title", type="log", row=2, col=2)

# Update yaxis properties
fig.update_yaxes(title_text="yaxis 1 title", row=1, col=1)
fig.update_yaxes(title_text="yaxis 2 title", range=[40, 80], row=1, col=2)
fig.update_yaxes(title_text="yaxis 3 title", showgrid=False, row=2, col=1)
fig.update_yaxes(title_text="yaxis 4 title", row=2, col=2)

# Update title and height
fig.update_layout(title_text="Customizing Subplot Axes", height=700)
fig.show()

Scatter Matrix¶

While a scatter matrix doesn't involve combining subplots into a single figure in the traditional sense, it's a great way to visualize relationships between multiple variables in a single plot, specifically showing each i x j variable combination as a scatterplot in a matrix of scatterplots. Here's an example of a scatter matrix for a bunch of flight data from a Seaborn built-in dataset, where points are colored by month.

In [ ]:
flights = sns.load_dataset('flights')
print('All data columns:')
print(flights.columns)
fig = px.scatter_matrix(flights, color='month')
fig.show()
All data columns:
Index(['year', 'month', 'passengers'], dtype='object')

Outputting Plotly plots¶

Plotly plots can be saved as standalone HTML files, which can be shared with others or embedded in web applications. You can also save plots as static images (PNG, JPEG, WebP, SVG, or PDF) using write_image, which relies on the kaleido package from the installation list, or as a JSON file as mentioned earlier.

In [ ]:
df = px.data.gapminder()
fig = px.scatter(df, x="gdpPercap", y="lifeExp", animation_frame="year", 
           animation_group="country",
           size="pop", color="continent", hover_name="country",
           log_x=True, size_max=55, range_x=[100,100000], range_y=[25,90])

# Save the plot as a PNG
fig.write_image('plot.png')

# Save the plot as a PDF
fig.write_image('plot.pdf')

# Save the plot as a SVG file
fig.write_image('plot.svg')

# Save the plot as an HTML file
fig.write_html('plot.html')

# Save the plot as a JSON file
fig.write_json('plot.json')

Example plots and customization options¶

NOTE! All credit for the code of the following plots (some of which were also used in the text above) goes to the YouTube creator Derek Banas. The code is taken from a Jupyter notebook "cheat sheet" he posted on his GitHub (https://github.com/derekbanas/plotly-tutorial) in conjunction with his YouTube video "Plotly Tutorial 2023" (https://www.youtube.com/watch?v=GGL6U0k8WYA&t=2111s).

The video tutorial is essentially him building all of the following plots from scratch. It's insanely comprehensive and well done, and I highly recommend watching it if you want to learn more about Plotly, especially the many visual and layout customization options that I've largely glossed over. With such amazing examples already in existence, I could see no reason to try and create my own (assuming I even could!), and have just opted to copy the code for these plots from his notebook into this one.

Line Plots¶

In [ ]:
# Allows us to create graph objects for making more customized plots
import plotly.graph_objects as go

# Use included Google price data to make one plot
df_stocks = px.data.stocks()
# The keys of "labels" are column names, not axis names
px.line(df_stocks, x='date', y='GOOG', labels={'date':'Date', 'GOOG':'Price'})
In [ ]:
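# Make multiple line plots. With a list of y columns, the y axis is labeled
# "value" and the legend "variable", so those are the keys to relabel
px.line(df_stocks, x='date', y=['GOOG','AAPL'],
       labels={'date':'Date', 'value':'Price', 'variable':'Company'},
       title='Apple Vs. Google')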
In [ ]:
# Create a figure to which I'll add plots
fig = go.Figure()
# You can pull individual columns of data from the dataset and use markers or not
fig.add_trace(go.Scatter(x=df_stocks.date, y=df_stocks.AAPL, 
                        mode='lines', name='Apple'))
fig.add_trace(go.Scatter(x=df_stocks.date, y=df_stocks.AMZN, 
                        mode='lines+markers', name='Amazon'))
# You can create custom lines (Dashes : dash, dot, dashdot)
fig.add_trace(go.Scatter(x=df_stocks.date, y=df_stocks.GOOG, 
                        mode='lines+markers', name='Google',
                        line=dict(color='firebrick', width=2, dash='dashdot')))
# Further style the figure
# fig.update_layout(title='Stock Price Data 2018 - 2020',
#                    xaxis_title='Date', yaxis_title='Price')

# Go crazy styling the figure
fig.update_layout(
    # Shows gray line without grid, styling fonts, linewidths and more
    xaxis=dict(
        showline=True,
        showgrid=False,
        showticklabels=True,
        linecolor='rgb(204, 204, 204)',
        linewidth=2,
        ticks='outside',
        tickfont=dict(
            family='Arial',
            size=12,
            color='rgb(82, 82, 82)',
        ),
    ),
    # Turn off everything on y axis
    yaxis=dict(
        showgrid=False,
        zeroline=False,
        showline=False,
        showticklabels=False,
    ),
    autosize=False,
    margin=dict(
        autoexpand=False,
        l=100,
        r=20,
        t=110,
    ),
    showlegend=False,
    plot_bgcolor='white'
)

Bar Charts¶

In [ ]:
# Get population change in US by querying for US data
df_us = px.data.gapminder().query("country == 'United States'")
px.bar(df_us, x='year', y='pop')
In [ ]:
# Create a stacked bar with more customization
df_tips = px.data.tips()
px.bar(df_tips, x='day', y='tip', color='sex', title='Tips by Sex on Each Day',
      labels={'tip': 'Tip Amount', 'day': 'Day of the Week'})
In [ ]:
# Place bars next to each other
px.bar(df_tips, x="sex", y="total_bill",
             color='smoker', barmode='group')
In [ ]:
# Display pop data for countries in Europe in 2007 greater than 2000000
df_europe = px.data.gapminder().query("continent == 'Europe' and year == 2007 and pop > 2.e6")
fig = px.bar(df_europe, y='pop', x='country', text='pop', color='country')
# Put bar total value above bars with 2 significant digits (SI prefix format)
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
# Set a minimum font size; uniformtext_mode='hide' hides text that won't fit at that size
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
# Rotate labels 45 degrees
fig.update_layout(xaxis_tickangle=-45)

Scatter Plot¶

In [ ]:
# Use included Iris data set
df_iris = px.data.iris()
# Create a scatter plot by defining x and y, a different color for each value of the
# provided column, marker size based on another column, and extra data to display on hover
px.scatter(df_iris, x="sepal_width", y="sepal_length", color="species",
                 size='petal_length', hover_data=['petal_width'])
In [ ]:
# Create a customized scatter with markers colored based on sepal width and a
# color scale shown on the right
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=df_iris.sepal_width, y=df_iris.sepal_length,
    mode='markers',
    marker_color=df_iris.sepal_width,
    text=df_iris.species,
    marker=dict(showscale=True)
))
# Give the markers an edge line of width 2 and set their size
fig.update_traces(marker_line_width=2, marker_size=10)
In [ ]:
# When working with a lot of data, use Scattergl
fig = go.Figure(data=go.Scattergl(
    x = np.random.randn(100000),
    y = np.random.randn(100000),
    mode='markers',
    marker=dict(
        color=np.random.randn(100000),
        colorscale='Viridis',
        line_width=1
    )
))
fig

Pie Charts¶

In [ ]:
# Create a pie chart of the populations of Asian countries in 2007
# Color maps here plotly.com/python/builtin-colorscales/
df_asia = px.data.gapminder().query("year == 2007").query("continent == 'Asia'")
px.pie(df_asia, values='pop', names='country',
       title='Population of Asian continent',
       color_discrete_sequence=px.colors.sequential.RdBu)
In [ ]:
# Customize pie chart
colors = ['blue', 'green', 'black', 'purple', 'red', 'brown']
fig = go.Figure(data=[go.Pie(labels=['Water','Grass','Normal','Psychic', 'Fire', 'Ground'], 
                       values=[110,90,80,80,70,60])])
# Define hover info, text size, pull amount for each pie slice, and stroke
fig.update_traces(hoverinfo='label+percent', textfont_size=20,
                  textinfo='label+percent', pull=[0.1, 0, 0.2, 0, 0, 0],
                  marker=dict(colors=colors, line=dict(color='#FFFFFF', width=2)))

Histograms¶

In [ ]:
# Plot histogram based on rolling 2 dice
dice_1 = np.random.randint(1,7,5000)
dice_2 = np.random.randint(1,7,5000)
dice_sum = dice_1 + dice_2
# nbins sets the (maximum) number of bins, i.e. bars, to make
# Can define x label, color, title
# marginal creates another plot (violin, box, rug)
fig = px.histogram(dice_sum, nbins=11, labels={'value':'Dice Roll'},
             title='5000 Dice Roll Histogram', marginal='violin',
            color_discrete_sequence=['green'])

fig.update_layout(
    xaxis_title_text='Dice Sum',
    yaxis_title_text='Count',
    bargap=0.2, showlegend=False
)
In [ ]:
# Stack histograms based on different column data
df_tips = px.data.tips()
px.histogram(df_tips, x="total_bill", color="sex")

Box Plots¶

In [ ]:
# A box plot allows you to compare different variables
# The box shows the quartiles of the data. The bar in the middle is the median 
# The whiskers extend to all the other data aside from the points that are considered
# to be outliers
df_tips = px.data.tips()
# We can compare tips by sex; points='all' displays all the data points
px.box(df_tips, x='sex', y='tip', points='all')
In [ ]:
# Display tips by day, split by sex
px.box(df_tips, x='day', y='tip', color='sex')
In [ ]:
# Adding standard deviation and mean
fig = go.Figure()
fig.add_trace(go.Box(x=df_tips.sex, y=df_tips.tip, marker_color='blue',
                    boxmean='sd'))
In [ ]:
# Complex Styling
df_stocks = px.data.stocks()
fig = go.Figure()
# Show all points, spread them so they don't overlap and change whisker width
fig.add_trace(go.Box(y=df_stocks.GOOG, boxpoints='all', name='Google',
                    fillcolor='blue', jitter=0.5, whiskerwidth=0.2))
fig.add_trace(go.Box(y=df_stocks.AAPL, boxpoints='all', name='Apple',
                    fillcolor='red', jitter=0.5, whiskerwidth=0.2))
# Change background / grid colors
fig.update_layout(title='Google vs. Apple', 
                  yaxis=dict(gridcolor='rgb(255, 255, 255)',
                 gridwidth=3),
                 paper_bgcolor='rgb(243, 243, 243)',
                 plot_bgcolor='rgb(243, 243, 243)')

Violin Plot¶

In [ ]:
# A violin plot combines a box plot with a KDE (kernel density estimate)
# While a box plot summarizes the data points, the violin shows a KDE of
# their distribution
df_tips = px.data.tips()
px.violin(df_tips, y="total_bill", box=True, points='all')
In [ ]:
# Multiple plots
px.violin(df_tips, y="tip", x="smoker", color="sex", box=True, points="all",
          hover_data=df_tips.columns)
In [ ]:
# Morph left and right sides based on if the customer smokes
fig = go.Figure()
fig.add_trace(go.Violin(x=df_tips['day'][ df_tips['smoker'] == 'Yes' ],
                        y=df_tips['total_bill'][ df_tips['smoker'] == 'Yes' ],
                        legendgroup='Yes', scalegroup='Yes', name='Yes',
                        side='negative',
                        line_color='blue'))
fig.add_trace(go.Violin(x=df_tips['day'][ df_tips['smoker'] == 'No' ],
                        y=df_tips['total_bill'][ df_tips['smoker'] == 'No' ],
                        legendgroup='No', scalegroup='No', name='No',
                        side='positive',
                        line_color='red'))

Density Heatmap¶

In [ ]:
# Create a heatmap using Seaborn data
flights = sns.load_dataset("flights")
flights

# You can set bins with nbinsx and nbinsy
fig = px.density_heatmap(flights, x='year', y='month', z='passengers', 
                         color_continuous_scale="Viridis")
fig
In [ ]:
# You can add histograms
fig = px.density_heatmap(flights, x='year', y='month', z='passengers', 
                         marginal_x="histogram", marginal_y="histogram")
fig

3D Scatter Plots¶

In [ ]:
# Create a 3D scatter plot using flight data
flights = sns.load_dataset("flights")
fig = px.scatter_3d(flights, x='year', y='month', z='passengers', color='year',
                   opacity=0.7, width=800, height=400)
fig

3D Line Plots¶

In [ ]:
fig = px.line_3d(flights, x='year', y='month', z='passengers', color='year')
fig

Scatter Matrix¶

In [ ]:
# A scatter matrix lets us compare every pair of columns in the data
fig = px.scatter_matrix(flights, color='month')
fig

Map Scatter Plots¶

In [ ]:
# There are many interesting ways of working with maps
# plotly.com/python-api-reference/generated/plotly.express.scatter_geo.html
df = px.data.gapminder().query("year == 2007")
fig = px.scatter_geo(df, locations="iso_alpha",
                     color="continent", # which column to use to set the color of markers
                     hover_name="country", # column added to hover information
                     size="pop", # size of markers
                     projection="orthographic")
fig

Choropleth Maps¶

In [ ]:
# You can color complex maps like we do here representing unemployment data

# Allows us to grab data from a supplied URL
from urllib.request import urlopen
# Used to decode JSON data
import json
# Grab US county geometry data
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)

# Grab unemployment data keyed by each county's FIPS (Federal Information Processing Standards) code
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/fips-unemp-16.csv",
                   dtype={"fips": str})

# Draw map using the county JSON data, color using unemployment values on a range of 0 to 12
fig = px.choropleth(df, geojson=counties, locations='fips', color='unemp',
                           color_continuous_scale="Viridis",
                           range_color=(0, 12),
                           scope="usa",
                           labels={'unemp':'unemployment rate'}
                          )
fig

Polar Chart¶

In [ ]:
# Polar charts display data radially 
# Let's plot wind data based on direction and frequency
# You can change size and auto-generate different symbols as well
df_wind = px.data.wind()
px.scatter_polar(df_wind, r="frequency", theta="direction", color="strength",
                size="frequency", symbol="strength")
In [ ]:
# Data can also be plotted using lines radially
# A template makes the data easier to see
px.line_polar(df_wind, r="frequency", theta="direction", color="strength",
                line_close=True, template="plotly_dark", width=800, height=400)

Ternary Plot¶

In [ ]:
# Used to represent ratios of 3 variables
df_exp = px.data.experiment()
px.scatter_ternary(df_exp, a="experiment_1", b="experiment_2", 
                   c='experiment_3', hover_name="group", color="gender")

Facets¶

In [ ]:
# You can create numerous subplots
df_tips = px.data.tips()
px.scatter(df_tips, x="total_bill", y="tip", color="smoker", facet_col="sex")
In [ ]:
# We can line up data in rows and columns
px.histogram(df_tips, x="total_bill", y="tip", color="sex", facet_row="time", facet_col="day",
       category_orders={"day": ["Thur", "Fri", "Sat", "Sun"], "time": ["Lunch", "Dinner"]})
In [ ]:
# This dataframe provides scores for different students based on the level
# of attention they could provide during testing
att_df = sns.load_dataset("attention")
fig = px.line(att_df, x='solutions', y='score', facet_col='subject',
             facet_col_wrap=5, title='Scores Based on Attention')
fig

Animated Plots¶

In [ ]:
# Create an animated plot that you can use to cycle through continent
# GDP & life expectancy changes
df_cnt = px.data.gapminder()
px.scatter(df_cnt, x="gdpPercap", y="lifeExp", animation_frame="year", 
           animation_group="country",
           size="pop", color="continent", hover_name="country",
           log_x=True, size_max=55, range_x=[100,100000], range_y=[25,90])
In [ ]:
# Watch as bars chart population changes
px.bar(df_cnt, x="continent", y="pop", color="continent",
  animation_frame="year", animation_group="country", range_y=[0,4000000000])